Handwriting patterns are one way to predict human character. Graphologists
analyze handwriting to determine a writer's personality by considering several
parameters: writing slope, spacing, inclination, and writing size. The results
of such analysis have been widely used as a reference for psychologists
assessing an individual's personality. Researchers have also applied image
processing techniques to identify human character. However, the variety of
handwriting styles calls for further research. Separating objects from the
background requires a segmentation process. This research improves the quality
of handwritten image segmentation using the k-means clustering algorithm
combined with a spatial filter, either a median or a mean filter. Several k
values were tested to find the best segmentation result. The median filter with
a 3x3 kernel and k = 2 gave the best segmentation result: its silhouette
coefficient, 99.22%, was the highest among all filter types and k values tested.
1. INTRODUCTION

Handwriting is a product of human activity with unique characteristics that differ from person to person.
Graphology experts [1], or graphologists, analyze human character and personality by observing
handwriting patterns visually [2]. An individual's behaviour reflects his or her personality, and
changes or fluctuations in behaviour, emotion, or mood may be seen when examining that individual's
handwriting. Generally, graphologists predict human character by considering handwriting
characteristics such as writing pressure, letter spacing, slope relative to the baseline, letter slant,
height of the t-bar, and width of margins.
The results of this analysis have been utilized in psychology, education, criminology, and medicine [3], [4].
Such behavioural traits can be identified using graphology, and applications of handwriting-based
personality analysis have been explored in several domains [5]. Improving accuracy requires
information technology that helps graphologists interpret handwritten images through image processing
techniques [6]. One of the important steps in image processing is segmentation, which aims to
separate objects from the background. Human handwriting comes in different styles and shapes. The quality of
a handwritten image crucially affects how well the writer's personality can be determined; thus, a
good-quality image is necessary [7].
Segmentation partitions an image into different regions by creating boundaries that
separate areas; one of the most frequently used steps is pixel zoning of an image [8]. Image
segmentation aims to divide an image into segments with similar features or attributes [9]. Edge-based
segmentation techniques can identify essential features in the image, namely corners, edges, points, and lines.
However, in some cases edge-based segmentation miscategorizes pixels, so edge detection is only one of the
available segmentation techniques [10]. Many studies have considered k-means and sought to maximize
the efficiency of the algorithm [11]. This algorithm performs unsupervised clustering: it classifies the
input data points into several categories according to their distance from each other [12]. It is an
iterative algorithm that minimizes the sum of distances between each object and the centroid of its
cluster. This research improved the quality of handwritten image segmentation
using the k-means clustering algorithm with a spatial filter. The image is first converted to grayscale,
which facilitates cluster extraction. To measure the performance of the segmentation results, the
silhouette coefficient was calculated [13]; it is a measure that combines cohesion and separation to
determine the quality of clusters. The k-means clustering algorithm divides the image into k groups and
assigns each point to one of them [14]; it is one of the algorithms that classify cluster regions by
considering certain characteristics [15]. Efficient image segmentation and implementation of k-means
clustering are urgently needed [16].
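
The quantity that k-means minimizes can be written out explicitly. The expression below is the standard formulation with squared Euclidean distance; the paper does not state which distance measure was used, so the squared norm and the symbols x_i, C_j, and \mu_j are assumptions introduced here for illustration.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Here C_j is the set of pixels assigned to cluster j and \mu_j is its centroid. Each iteration reassigns every pixel to its nearest centroid and then recomputes the centroids, and neither step can increase J.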

Much research on image segmentation uses conventional and learning-based
methods, among which the k-means algorithm is one of the simplest for generating a region of interest [17], [18]. The
following previous studies investigated the processing of handwritten images. Brodowska
examined handwritten image segmentation and employed several approaches, such as holistic, classic,
recognition-based, and mixed approaches. The approaches were selected depending on the
types of alphabets and characters read [19]. Nath and Rastogi [20] developed several stages of optical character
recognition (OCR). The segmentation was done using explicit and implicit approaches, while the
classification was done by varying some features. Choudhary et al. [21] studied handwritten image
segmentation using an object or region selection-based approach that keeps regions whose width is above a
limit value. The extraction was performed on each character, while words with different sizes were
considered as background noise. Choudhary et al. [22] also developed a new technique for
vertical segmentation in which the segmentation was performed per pixel after the characters had been
thinned to obtain the character sizes. This technique improved the quality of word segmentation on handwritten
images, so problems in segmenting open characters were minimized. Rani and Kumar [23] studied
character segmentation, noting that the variable size of handwritten characters caused problems in the segmentation.
Phukan and Borah [24] developed a character recognition system with varied processing stages. Features were
extracted using moment features. The recognition process employed several approaches, such as
template matching, statistical techniques, structural techniques, and neural networks. Choudhary reviewed
segmentation for automatic recognition of handwriting on a static surface, covering explicit, implicit, and
holistic approaches, and compared the results of
previous research that investigated word segmentation [25]. Dave [26] conducted a study using several
methods for text segmentation on images. This research helped people read text in images based on the
segmentation area in computer vision. The first step was conducting the segmentation based on the retrieved
information. Then, factors that affected the segmentation process were varied. The next step was developing
segmentation levels. The last step was reviewing the employed techniques, with the advantages and disadvantages of
each, to provide suggestions for developing the next method. Bal and Saha [27] developed a system to
identify human character using image segmentation and a rule-based method applied to the slope,
baseline, and writing thickness. Durga and Deepu [28] developed a technique to recognize handwriting using
convolutional neural network algorithms on the letters i and t and achieved 90% accuracy. Jindal and Ghosh
[29] developed word and character segmentation for ancient handwritten Devanagari and Maithili
documents using horizontal zoning and obtained an accuracy of 97.39%.

In summary, handwritten image segmentation plays a crucial role in improving the accuracy of 
handwriting-based personality analysis. Researchers have employed various techniques, including the 
k-means clustering algorithm, to segment handwritten images and enhance the quality of the analysis. This 
field continues to evolve with advancements in image processing and machine learning techniques. 


2. METHOD 

This study employed data in the form of scanned images of handwriting obtained from
graphologists. The image processing started with reading the handwriting image files, which were
grayscale. Then, the handwriting regions were cropped manually. The next step was
improving the quality of the images using a spatial filter. The following step was segmentation using
the k-means clustering algorithm to separate the handwriting objects from the background. The last step was
an evaluation to determine the performance of the segmentation results. The flow diagram of the image
processing is presented in Figure 1.

[Figure 1 flow: Reading handwritten images → Cropping → Repairing the quality of images → Segmenting (k-means clustering) → Calculating the silhouette coefficient]

Figure 1. Flow diagram of the image processing


3. RESULTS AND DISCUSSION 


3.1. Reading handwriting images 
The input data used in this image processing study consisted of handwritten images that had been
cropped and converted to grayscale. An example of the input image is shown in Figure 2.
Grayscale images are suitable for image processing tasks such as clustering and segmentation
because they simplify the data representation while retaining the essential visual information.

Figure 2. Input images
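
The grayscale conversion mentioned above can be sketched as follows. The paper does not say how the conversion was performed, so the BT.601 luma weights and the function name `to_grayscale` are assumptions chosen for illustration.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array (values 0-255) to an H x W uint8 grayscale array."""
    # Assumed weights: ITU-R BT.601 luma coefficients for R, G, B.
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# Tiny synthetic example: one white "paper" pixel and one dark "ink" pixel.
sample = np.array([[[255, 255, 255], [20, 20, 60]]], dtype=np.uint8)
gray = to_grayscale(sample)
print(gray)
```

Each pixel is reduced to a single intensity value, which is the one-dimensional feature that the later clustering step can operate on.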


3.2. Improving the quality of images 
The quality of the images was enhanced by applying spatial filters, specifically
median and mean filters, which reduce noise and improve image clarity.
Figure 3 displays an image enhanced using a 3x3 median filter. The filtered
image shows reduced noise and improved overall visual quality,
which is crucial for achieving more accurate and reliable segmentation results in the subsequent analysis. The
segmentation that follows separates the objects from the background within the
images, a step that is critical for isolating the specific elements or regions of interest.

Figure 3. The image resulting from quality improvement
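
As a rough illustration of the two spatial filters, the sketch below applies a median or a mean filter of odd kernel size to a grayscale array. It is not the authors' implementation: the function name `spatial_filter` and the edge-replication border handling are assumptions, since the paper does not describe how image borders were treated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spatial_filter(image, size=3, kind="median"):
    """Apply a size x size median or mean filter to a 2-D grayscale image."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Assumed border policy: replicate the edge pixels.
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    # One size x size neighbourhood per output pixel.
    windows = sliding_window_view(padded, (size, size))
    if kind == "median":
        return np.median(windows, axis=(-2, -1))
    if kind == "mean":
        return windows.mean(axis=(-2, -1))
    raise ValueError("kind must be 'median' or 'mean'")


# Light background (200) with a single dark noise pixel in the centre.
noisy = np.full((5, 5), 200, dtype=np.uint8)
noisy[2, 2] = 0
median_out = spatial_filter(noisy, 3, "median")
mean_out = spatial_filter(noisy, 3, "mean")
```

On this synthetic input the median filter removes the isolated noise pixel entirely, while the mean filter only spreads it over the neighbourhood, which is one plausible reason for the two filters behaving differently in the later tables.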


3.3. Segmentation 
Segmentation was done using the k-means clustering algorithm with varying k values. The
binary images resulting from segmentation with the various k values are shown in Figures 4 to 7. K-means
is a clustering algorithm used in data analysis and image processing. With the parameter k=2, the
algorithm attempts to separate the elements in handwriting into two groups based on similarity of attributes or
features, often referred to as 'cluster 1' and 'cluster 2' or 'group 1' and 'group 2'.

When using k-means with k=3, the algorithm will aim to divide the elements in handwriting into
three groups based on their attribute or feature similarities. The segmentation result will yield three distinct
clusters, often labeled as 'cluster 1,' 'cluster 2,' and 'cluster 3' or 'group 1,' 'group 2,' and 'group 3.' This level
of segmentation can be valuable in scenarios where finer-grained separation of elements within
handwritten text is needed, such as distinguishing between letters, numbers, and special characters.

With k=4 in k-means, the algorithm will work to separate elements in handwriting into four distinct
groups, each characterized by similarities in their attributes or features. The segmentation outcome will
present four separate clusters, typically identified as 'cluster 1,' 'cluster 2,' 'cluster 3,' and 'cluster 4' or
'group 1,' 'group 2,' 'group 3,' and 'group 4.' This increased level of segmentation can be advantageous when
dealing with handwritten documents containing a variety of characters or symbols, enhancing the ability to
differentiate between different types of content.

Figure 4. The result of binary image segmentation with the k value = 2
Figure 5. The result of binary image segmentation with the k value = 3
Figure 6. The result of binary image segmentation with the k value = 4
Figure 7. The result of binary image segmentation with the k value = 5
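
The segmentation step described in this section can be sketched as k-means applied to pixel intensities. This is a minimal illustration under stated assumptions, not the authors' code: the gray level is taken as the only feature, the initial centroids are spaced evenly over the intensity range, and the binary image is formed by marking the darkest cluster as handwriting. The names `kmeans_intensity` and `to_binary` are hypothetical.

```python
import numpy as np


def kmeans_intensity(image, k=2, max_iter=100):
    """Cluster the pixel intensities of a 2-D grayscale image into k groups."""
    pixels = image.astype(np.float64).ravel()
    # Assumed deterministic initialisation: centroids evenly spaced over the range.
    centroids = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(max_iter):
        # Assignment step: each pixel goes to its nearest centroid.
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = pixels[labels == j]
            if members.size:
                new_centroids[j] = members.mean()
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels.reshape(image.shape), centroids


def to_binary(labels, centroids):
    """Assumed rule: the darkest cluster is handwriting (1), the rest background (0)."""
    return (labels == np.argmin(centroids)).astype(np.uint8)


# Synthetic 4x4 "page": light background with a dark 2x2 stroke in the middle.
page = np.array(
    [[205, 200, 210, 200],
     [200, 25, 20, 205],
     [210, 30, 25, 200],
     [200, 205, 200, 210]],
    dtype=np.uint8,
)
labels, centroids = kmeans_intensity(page, k=2)
binary = to_binary(labels, centroids)
print(binary)
```

With k=2 the two centroids settle on the ink and paper intensities, so the binary image isolates the stroke. For larger k the same function returns more intensity groups, and how those are merged into a binary image is a design choice the paper does not specify.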


3.4. Calculating the silhouette coefficient 

The silhouette coefficients of several images segmented without filters
are summarized in Table 1. The silhouette coefficient serves as a metric for assessing the similarity of objects
within clusters and the dissimilarity between objects in different clusters. Ranging from -1 to 1, positive 
values indicate better similarity within clusters, negative values imply less ideal separation, and values near 0 
denote cluster overlap. When applied to segmentation with different k values, such as k=2, k=3, k=4, and 
k=5, the silhouette coefficient offers insights into the quality of cluster separation for specific data or images. 
In essence, for k=2, it gauges the effectiveness of dividing data into two clusters, while for k=3, it evaluates 
the separation of three clusters, and so on. Comparing these silhouette coefficient values aids in determining 
the optimal number of clusters for segmentation within a given context. 
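
To make the metric concrete, the sketch below computes the mean silhouette coefficient for one-dimensional values such as gray levels. For each point i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The function name is hypothetical, and the pairwise-distance matrix makes this practical only for small inputs; how the authors handled full-size images is not stated.

```python
import numpy as np


def silhouette_coefficient(values, labels):
    """Mean silhouette coefficient of 1-D values under the given cluster labels."""
    values = np.asarray(values, dtype=np.float64)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("at least two clusters are required")
    # Pairwise absolute distances; memory grows with the square of the input size.
    dist = np.abs(values[:, None] - values[None, :])
    scores = np.zeros(values.size)
    for i in range(values.size):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # convention: s(i) = 0 for a single-member cluster
        a = dist[i, same].sum() / (n_same - 1)  # cohesion
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])  # separation
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


# Two well-separated intensity groups (ink and paper).
gray_levels = [20, 25, 30, 200, 205, 210]
good = silhouette_coefficient(gray_levels, [0, 0, 0, 1, 1, 1])  # approximately 0.963
poor = silhouette_coefficient(gray_levels, [0, 1, 0, 1, 0, 1])
print(good, poor)
```

A labeling that matches the natural grouping scores close to 1, while a labeling that mixes the two groups scores much lower, which is the behaviour the comparison across k values relies on.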


Table 1. Silhouette coefficients of images resulting from the segmentation without filters

No  Names of images  Silhouette coefficients
                     k=2     k=3     k=4     k=5
1   text 01.jpg      0.9860  0.9702  0.9707  0.9605
2   text 02.jpg      0.9867  0.9870  0.9865  0.9842
3   text 03.jpg      0.9806  0.9751  0.9759  0.9736
4   text 04.jpg      0.9921  0.9793  0.9876  0.9778
5   text 05.jpg      0.9849  0.9771  0.9812  0.9727


Tables 2 and 3 present the silhouette coefficients of images segmented after median filtering, for
k=2, k=3, k=4, and k=5. In Table 2 the images were filtered with a 3x3 median filter, and in Table 3
with a 5x5 median filter. As before, the silhouette coefficient ranges from -1 to 1: positive values
indicate stronger intra-cluster similarity, negative values indicate less-than-optimal separation,
and values near 0 indicate cluster overlap. For k=2 it therefore measures how effectively the data are
divided into two clusters, and for k=3, k=4, and k=5 how well three, four, or five clusters separate
the objects in the image. The choice of a 3x3 or 5x5 kernel can lead to different segmentation
characteristics, which is why the silhouette coefficients are compared across both filter sizes.

Table 2 shows that, with a 3x3 median filter, the highest silhouette coefficient for every image was
obtained at k=2. The largest value across all filter settings, 0.9922 for text 04.jpg, also occurs in
this configuration. The silhouette coefficients thus indicate better intra-cluster similarity and more
distinct inter-cluster separation with these settings, which suggests that, for the analyzed
images and segmentation task, a 3x3 median filter with k=2 yielded the best
clustering outcome.


Table 2. Silhouette coefficients of images resulting from the segmentation with a 3x3 median filter

No  Names of images  Silhouette coefficients
                     k=2     k=3     k=4     k=5
1   text 01.jpg      0.9879  0.9741  0.9666  0.9696
2   text 02.jpg      0.9877  0.9840  0.9840  0.9801
3   text 03.jpg      0.9857  0.9754  0.9717  0.9715
4   text 04.jpg      0.9922  0.9854  0.9828  0.9784
5   text 05.jpg      0.9870  0.9748  0.9817  0.9778


Table 3. Silhouette coefficients of images resulting from the segmentation with a 5x5 median filter

No  Names of images  Silhouette coefficients
                     k=2     k=3     k=4     k=5
1   text 01.jpg      0.9899  0.9764  0.9599  0.9620
2   text 02.jpg      0.9863  0.9840  0.9822  0.9798
3   text 03.jpg      0.9881  0.9681  0.9674  0.9641
4   text 04.jpg      0.9916  0.9814  0.9785  0.9863
5   text 05.jpg      0.9870  0.9794  0.9796  0.9780


The silhouette coefficients for image segmentation using a 3x3 mean filter and different values of
k in k-means clustering are shown in Table 4. With this filter, the best clustering result for every
image was obtained at k=2. Table 5 shows the results for a 5x5 mean filter. Here the highest
silhouette coefficient, 0.9744, was reached at k=4 (text 04.jpg), though only marginally above the
values for the same image at k=3 (0.9740) and k=2 (0.9734). This suggests that k=4 can give the best
cluster separation with a 5x5 mean filter, but the preferred k varies between images: k=2 for
text 01.jpg and text 05.jpg, k=3 for text 03.jpg, and k=4 for text 02.jpg and text 04.jpg.


Table 4. Silhouette coefficients of images resulting from the segmentation with a 3x3 mean filter

No  Names of images  Silhouette coefficients
                     k=2     k=3     k=4     k=5
1   text 01.jpg      0.9745  0.9686  0.9587  0.9499
2   text 02.jpg      0.9811  0.9772  0.9704  0.9772
3   text 03.jpg      0.9689  0.9600  0.9496  0.9580
4   text 04.jpg      0.9821  0.9779  0.9710  0.9614
5   text 05.jpg      0.9717  0.9694  0.9633  0.9662


Table 5. Silhouette coefficients of images resulting from the segmentation with a 5x5 mean filter

No  Names of images  Silhouette coefficients
                     k=2     k=3     k=4     k=5
1   text 01.jpg      0.9643  0.9584  0.9605  0.9486
2   text 02.jpg      0.9679  0.9622  0.9708  0.9630
3   text 03.jpg      0.9480  0.9494  0.9474  0.9364
4   text 04.jpg      0.9734  0.9740  0.9744  0.9727
5   text 05.jpg      0.9614  0.9570  0.9570  0.9566
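
The comparison across Tables 1 to 5 can be reproduced with a few lines of code. The values below are transcribed from the tables; the dictionary keys and the helper names `best_setting` and `best_k_per_image` are labels chosen here for illustration, and taking the single largest coefficient as the selection criterion is an assumption about how the best setting was picked.

```python
K_VALUES = (2, 3, 4, 5)

# Rows are text 01.jpg to text 05.jpg; columns follow K_VALUES.
TABLES = {
    "no filter": [
        [0.9860, 0.9702, 0.9707, 0.9605],
        [0.9867, 0.9870, 0.9865, 0.9842],
        [0.9806, 0.9751, 0.9759, 0.9736],
        [0.9921, 0.9793, 0.9876, 0.9778],
        [0.9849, 0.9771, 0.9812, 0.9727],
    ],
    "median 3x3": [
        [0.9879, 0.9741, 0.9666, 0.9696],
        [0.9877, 0.9840, 0.9840, 0.9801],
        [0.9857, 0.9754, 0.9717, 0.9715],
        [0.9922, 0.9854, 0.9828, 0.9784],
        [0.9870, 0.9748, 0.9817, 0.9778],
    ],
    "median 5x5": [
        [0.9899, 0.9764, 0.9599, 0.9620],
        [0.9863, 0.9840, 0.9822, 0.9798],
        [0.9881, 0.9681, 0.9674, 0.9641],
        [0.9916, 0.9814, 0.9785, 0.9863],
        [0.9870, 0.9794, 0.9796, 0.9780],
    ],
    "mean 3x3": [
        [0.9745, 0.9686, 0.9587, 0.9499],
        [0.9811, 0.9772, 0.9704, 0.9772],
        [0.9689, 0.9600, 0.9496, 0.9580],
        [0.9821, 0.9779, 0.9710, 0.9614],
        [0.9717, 0.9694, 0.9633, 0.9662],
    ],
    "mean 5x5": [
        [0.9643, 0.9584, 0.9605, 0.9486],
        [0.9679, 0.9622, 0.9708, 0.9630],
        [0.9480, 0.9494, 0.9474, 0.9364],
        [0.9734, 0.9740, 0.9744, 0.9727],
        [0.9614, 0.9570, 0.9570, 0.9566],
    ],
}


def best_setting(tables):
    """Return (coefficient, filter name, k, image number) of the largest table entry."""
    return max(
        (value, name, k, row_index + 1)
        for name, rows in tables.items()
        for row_index, row in enumerate(rows)
        for k, value in zip(K_VALUES, row)
    )


def best_k_per_image(rows):
    """For each image (row), return the k with the highest coefficient."""
    return [K_VALUES[row.index(max(row))] for row in rows]


best = best_setting(TABLES)
print(best)  # (0.9922, 'median 3x3', 2, 4)
for name, rows in TABLES.items():
    print(name, best_k_per_image(rows))
```

The largest entry is 0.9922 for image 4 with the 3x3 median filter at k=2, and the per-image lists show where k=2 is preferred for every image and where the preferred k varies.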
4. CONCLUSION

This study investigated handwritten image segmentation using the k-means clustering algorithm,
with image quality improved beforehand using spatial filters. The best segmentation
result came from a 3x3 median filter with k=2, which reached a silhouette coefficient of 0.9922 (99.22%). The results of
this study can serve as a reference for developing a system that analyzes handwritten images to identify
human character. Further research can apply a Gaussian filter to eliminate noise in handwritten
images and deep learning methods to identify the character of writers.